Learning Semantic Sub-graphs for Document Summarization
Abstract
In this paper we present a method for summarizing a document by creating a semantic graph of the original document and identifying the substructure of that graph that can be used to extract sentences for a document summary. We start with deep syntactic analysis of the text and, for each sentence, extract logical form triples of the form subject–predicate–object. We then apply cross-sentence pronoun resolution, co-reference resolution, and semantic normalization to refine the set of triples and merge them into a semantic graph. This procedure is applied to both the documents and their corresponding summary extracts. We train a linear Support Vector Machine (SVM) on the logical form triples to learn which triples belong to sentences in document summaries. The classifier is then used to create document summaries of the test data automatically. Our experiments with the DUC 2002 data show that extending the set of attributes to include semantic properties and topological graph properties of the logical form triples yields a statistically significant improvement in the micro-averaged F1 measure for the extracted summaries. We also observe that attributes describing various aspects of the semantic graph receive high weights in the model learned by the SVM.
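The step of merging triples into a semantic graph and deriving topological attributes can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the example triples, the function names `build_semantic_graph` and `triple_features`, and the choice of in-degree and out-degree as attributes are hypothetical and are not taken from the paper, which does not list its exact attribute set in this abstract.

```python
from collections import defaultdict

# Hypothetical triples, as if already extracted from sentences and
# normalized (pronouns and co-references resolved). Not from the paper.
TRIPLES = [
    ("company", "acquire", "startup"),
    ("startup", "develop", "software"),
    ("company", "announce", "acquisition"),
    ("analyst", "praise", "company"),
]


def build_semantic_graph(triples):
    """Merge triples into a directed graph keyed by node.

    Returns two adjacency maps: outgoing edges (node acts as subject)
    and incoming edges (node acts as object).
    """
    out_edges = defaultdict(list)
    in_edges = defaultdict(list)
    for subj, pred, obj in triples:
        out_edges[subj].append((pred, obj))
        in_edges[obj].append((pred, subj))
    return out_edges, in_edges


def triple_features(triple, out_edges, in_edges):
    """Illustrative topological attributes of one triple:
    degree counts of its subject and object nodes."""
    subj, _pred, obj = triple
    return {
        "subj_out_degree": len(out_edges[subj]),
        "subj_in_degree": len(in_edges[subj]),
        "obj_out_degree": len(out_edges[obj]),
        "obj_in_degree": len(in_edges[obj]),
    }


out_edges, in_edges = build_semantic_graph(TRIPLES)
features = [triple_features(t, out_edges, in_edges) for t in TRIPLES]
```

In a setup like the one the abstract describes, feature dictionaries of this kind would be combined with linguistic attributes and passed to a linear classifier, with each triple labeled by whether its sentence appears in the reference summary extract.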
Related resources
Impact of Linguistic Analysis on the Semantic Graph Coverage and Learning of Document Extracts
Automatic document summarization is a problem of creating a document surrogate that adequately represents the full document content. We aim at a summarization system that can replicate the quality of summaries created by humans. In this paper we investigate the machine learning method for extracting full sentences from documents based on the document semantic graph structure. In particular, we ...
Reconciling Event-Based Knowledge Through RDF2VEC
The reconciled knowledge graphs are typically used for multidocument summarization, or to detect knowledge evolution across document series. This paper focuses on reconciling knowledge graphs generated from two text documents about similar events described differently. Our approach employs and extends MERGILO, a tool for reconciling knowledge graphs extracted from text, using word similarity an...
Graph-Based Multi-Modality Learning for Topic-Focused Multi-Document Summarization
Graph-based manifold-ranking methods have been successfully applied to topic-focused multi-document summarization. This paper further proposes to use the multi-modality manifold-ranking algorithm for extracting topic-focused summary from multiple documents by considering the within-document sentence relationships and the cross-document sentence relationships as two separate modalities (graphs)....
Two-tier Architecture for Domain Specific Document Summarization Using Probabilistic Latent Semantic Analysis
In this research work we propose a two-tier architecture for document summarization. This architecture minimizes redundancy and boosts information relevancy in the summary by applying Probabilistic Latent Semantic Analysis (PLSA) at two levels. It also enhances the summarizer’s speed by using the Incremental Expectation Maximization algorithm for PLSA learning rather than Expectation Ma...
Semantic Graphs Derived From Triplets with Application in Document Summarization
Information nowadays has become more and more accessible, so much as to give birth to an information overload issue. Yet important decisions have to be made, depending on the available information. As it is impossible to read all the relevant content that helps one stay informed, a possible solution would be condensing data and obtaining the kernel of a text by automatically summarizing it. We ...
Publication date: 2004